

** Create clusters of pollution monitors to replicate the Deryugina et al. specification


***Inputs:
* $Data/aqs_sites.dta
* $Data/aqs_sites_dailyid.dta
* $Data/UserMonitorMatch.dta

***Outputs: 
* $Data/AQS_clusters_distance_seed.dta


************** pollution by distance
* Fix the RNG seed: cluster kmeans picks its starting centers at random by default
set seed 17012023

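 use "$Data/aqs_sites.dta", clear
 
 * drop sites that closed before 2012 (a missing site_close is not < the cutoff, so those sites are kept)
 drop if site_close < td(01jan2012)
 
 keep statecode countycode sitenumber latitude longitude
 
 * keep only sites with nonmissing coordinates
 keep if longitude < . & latitude < .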

 * attach the monitor id to each site
 merge 1:1 statecode countycode sitenumber using "$Data/aqs_sites_dailyid.dta"
 
 drop _merge
 
 * keep only sites that have an id
 keep if id < .
 
 merge 1:m id using "$Data/UserMonitorMatch.dta"

 keep if _merge == 3 // keep only matched stations
 
 drop _merge

 keep id latitude longitude 
 
 duplicates drop
 

 
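 * k-means on raw latitude/longitude (default Euclidean distance, in degrees);
 * name() gives each solution an explicit group variable instead of relying on _clus_#
 cluster kmeans latitude longitude, k(100) name(clus100)
 cluster kmeans latitude longitude, k(75)  name(clus75)
 cluster kmeans latitude longitude, k(125) name(clus125)
 
 
 * re-attach the matched records, which carry zip
 merge 1:m id using "$Data/UserMonitorMatch.dta"
 
 keep if _merge == 3
 drop _merge
 
 * assign each zip the most common cluster among its monitors; maxmode breaks ties with the highest cluster number
 by zip, sort: egen monitorcluster = mode(clus100), maxmode
 by zip, sort: egen altmonitorcluster75 = mode(clus75), maxmode
 by zip, sort: egen altmonitorcluster125 = mode(clus125), maxmode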

 
 
 keep zip monitorcluster altmonitorcluster* 
 
 
 
 duplicates drop 
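 
 * Optional sanity check (a sketch added for illustration, not part of the
 * original workflow). It assumes zip is meant to be the unique key of the
 * output file: the cluster variables are constant within zip, so after
 * duplicates drop there should be one row per zip. missok lets the check
 * pass if some records have a missing zip.
 isid zip, missok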
 save "$Data/AQS_clusters_distance_seed.dta", replace

 
 
 
 
 
